5 research outputs found

Distributional semantics and machine learning for statistical machine translation

Author: Artetxe Zurutuza Mikel
Publication date: 17/05/2016

In this work, we explore the use of distributional semantics and machine learning to improve statistical machine translation. For that purpose, we propose the use of a logistic regression based machine learning model for dynamic phrase translation probability modeling. We prove that the proposed model can be seen as a generalization of the standard translation probabilities used in statistical machine translation, and use it to incorporate context and distributional semantic information through lexical, word cluster and word embedding features. Apart from that, we explore the use of word embeddings for phrase translation probability scoring as an alternative approach to incorporate distributional semantic knowledge into statistical machine translation. Our experiments show the effectiveness of the proposed models, achieving promising results over a strong baseline. At the same time, our work makes important contributions in relation to bilingual word embedding mappings and word embedding based phrase similarity measures, which go beyond machine translation and have an intrinsic value in the field of distributional semantics.
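The generalization claim in the abstract can be illustrated with a minimal sketch. It assumes one possible reading of the model: a multinomial logistic regression (softmax) over candidate target phrases with a single indicator feature per phrase pair. The thesis may define its model differently; the function names and the toy counts below are hypothetical. Under that assumption, the maximum-likelihood softmax fit coincides with the standard relative-frequency estimate count(source, target) / count(source).

```python
import math

def relative_frequency(counts):
    """Standard phrase translation probability: count(s, t) / count(s)."""
    total = sum(counts.values())
    return {target: c / total for target, c in counts.items()}

def fit_softmax(counts, steps=2000, learning_rate=1.0):
    """Fit one weight per target phrase by gradient ascent on the
    average log-likelihood of a softmax model (indicator features only)."""
    targets = list(counts)
    total = sum(counts.values())
    weights = {t: 0.0 for t in targets}

    def probabilities():
        norm = sum(math.exp(weights[t]) for t in targets)
        return {t: math.exp(weights[t]) / norm for t in targets}

    for _ in range(steps):
        probs = probabilities()
        for t in targets:
            # Gradient: empirical frequency minus model probability.
            weights[t] += learning_rate * (counts[t] / total - probs[t])
    return probabilities()

# Hypothetical counts of target phrases aligned to one source phrase.
toy_counts = {"house": 3, "home": 1}
print(relative_frequency(toy_counts))
print(fit_softmax(toy_counts))
```

Adding further features (context words, word clusters, embeddings) to such a model would move it away from plain relative frequencies, which is the sense in which it generalizes them.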

Archivo Digital para la Docencia y la Investigación

Itzulpen-sistema hibridoen eraikuntza EBMT bidezko itzulpen partzialak erabiliz [Building hybrid translation systems using EBMT-based partial translations]

Author: Artetxe Zurutuza Mikel
Publication date: 01/07/2014

In this project, we developed a hybridization mechanism for machine translation based on a preprocessing step that produces partial translations using EBMT techniques, generalizing over entities and over syntactic units smaller than the sentence. The system was designed to be very lightweight and scalable, and was given a modular, extensible and efficient implementation that integrates with a wide range of resources and tools. The experiments yielded very positive results, showing that the proposed system can bring notable improvements over the baseline.

Archivo Digital para la Docencia y la Investigación

Distributional semantics and machine learning for statistical machine translation

Author: Artetxe Zurutuza Mikel
Publication date: 01/01/2016

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital para la Docencia y la Investigación

Itzulpen-sistema hibridoen eraikuntza EBMT bidezko itzulpen partzialak erabiliz [Building hybrid translation systems using EBMT-based partial translations]

Author: Artetxe Zurutuza Mikel
Publication date: 01/07/2014

Archivo Digital para la Docencia y la Investigación

Multilingual machine translation: Closing the gap between shared and language-specific encoder-decoders

Authors: Artetxe Zurutuza Mikel; Escolano Peinado Carlos; Rodríguez Fonollosa José Adrián; Ruiz Costa-Jussà Marta
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2021

State-of-the-art multilingual machine translation relies on a universal encoder-decoder, which requires retraining the entire system to add new languages. In this paper, we propose an alternative approach that is based on language-specific encoder-decoders, and can thus be more easily extended to new languages by learning their corresponding modules. To encourage a common interlingua representation, we simultaneously train the N initial languages. Our experiments show that the proposed approach outperforms the universal encoder-decoder by 3.28 BLEU points on average, while allowing new languages to be added without retraining the rest of the modules. All in all, our work closes the gap between shared and language-specific encoder-decoders, advancing toward modular multilingual machine translation systems that can be flexibly extended in lifelong learning settings.
This work is supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 947657).
Peer Reviewed. Postprint (published version).
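The modular design described in this abstract can be pictured with a small bookkeeping sketch. It is not the paper's implementation: the class and method names are hypothetical, and no actual training happens. It only shows the assumed structure, in which each language owns one encoder and one decoder, the initial languages are trainable together, and adding a language freezes every existing module so that only the new pair is learned.

```python
from dataclasses import dataclass

@dataclass
class Module:
    """Placeholder for a language-specific encoder or decoder."""
    name: str
    frozen: bool = False

class ModularTranslator:
    def __init__(self, initial_languages):
        # The initial languages start unfrozen, i.e. trained jointly.
        self.encoders = {}
        self.decoders = {}
        for lang in initial_languages:
            self._create(lang)

    def _create(self, lang):
        self.encoders[lang] = Module(f"enc:{lang}")
        self.decoders[lang] = Module(f"dec:{lang}")

    def modules(self):
        return list(self.encoders.values()) + list(self.decoders.values())

    def add_language(self, lang):
        # Freeze everything learned so far; only the new pair is trainable.
        for module in self.modules():
            module.frozen = True
        self._create(lang)

    def trainable(self):
        return [m.name for m in self.modules() if not m.frozen]

    def route(self, source, target):
        # Any encoder can be paired with any decoder.
        return self.encoders[source].name, self.decoders[target].name

system = ModularTranslator(["en", "de"])
system.add_language("fr")
print(system.trainable())
print(system.route("fr", "en"))
```

Pairing a new encoder with an existing decoder only makes sense if the modules share a common intermediate representation, which is the role the abstract assigns to joint training of the initial languages.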

UPCommons. Portal del coneixement obert de la UPC